Automated Diagnosis for Electronic Systems

ABSTRACT

Systems for providing automated diagnosis of problems for an electronic network include a central diagnosis engine configured to include modules that rank identified policy/configuration changes into potential causes, verify the ranked potential causes and determine whether any of the ranked potential causes is a likely cause or contributor to the problem. An estimator module is configured to calculate distances associated with the ranked potential causes such that a list of potential causes of the problem can be presented in order of likelihood. Other systems and methods are also provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application is a continuation of co-pending U.S. patent application Ser. No. 10/611,634, entitled “Automated Diagnosis for Computer Networks,” filed Jun. 30, 2003, which is herein incorporated by reference in its entirety.

TECHNICAL FIELD

The present invention is generally related to computer systems and, more particularly, is related to providing customer support for security and management of complex networks.

BACKGROUND OF THE INVENTION

Communications networks and systems have become increasingly complex. Sophisticated networks typically include a large number of elements or system components. Management of these networks often requires information and knowledge about these elements or combinations of elements such that problems can be readily resolved.

System administrators who manage these complex networks are expected to provide a highly reliable and secure system. To provide a continuously high level of service, the system administrators have become problem solvers. The system administrators are often expected to respond to and understand the impact of changes to the network made by both authorized personnel and intruders, such that they can avoid or resolve problems in a timely fashion. Thus, system administrators should be knowledgeable about elements, rules, characteristics and other data related to the operation, management and control of the network. As networks continue to grow in size and complexity, it is becoming increasingly difficult for system administrators to remain informed of the operation of each element of the network using existing tools available to manage the elements. Further, the ability of system administrators to identify causes of problems is likewise diminished by the sheer complexity involved. However, having access to such information and being able to quickly identify problem causes have become prerequisites for timely resolution of problems that arise in complex modern networks.

Thus, heretofore-unaddressed needs exist for a solution that addresses the aforementioned deficiencies and inadequacies.

SUMMARY OF THE INVENTION

Preferred embodiments of the present invention provide a system and method for automated diagnosis of security and reliability problems for electronic systems, such as profile or policy enabled systems.

Briefly described, in architecture, one embodiment of the system, among others, can be implemented to include a central diagnosis engine configured to include a rank estimator module that ranks identified changes into potential causes, a verifier module configured to verify the ranked potential causes and to determine whether any of the ranked potential causes may be an actual cause or contributor to the problem, and a distance estimator module configured to calculate distances associated with the ranked potential causes such that a list of likely potential causes of the problem can be presented. An adaptive logger is operatively coupled to the central diagnosis engine and is configured to record configuration changes made to the electronic system that fall within pre-established parameters such that a possible cause of the problem can be identified.

Preferred embodiments of the present invention can also be viewed as providing methods for the automated diagnosis of security and reliability problems for electronic profile or policy enabled systems. In this regard, one embodiment of such a method, among others, can be broadly summarized by the following steps: identifying recent configuration changes made to the electronic system that fall within pre-established parameters; ranking the identified changes into potential causes; verifying ranked potential causes to determine whether any of the ranked potential causes may be an actual cause or contributor to the problem; and calculating distances associated with the ranked potential causes to help determine the actual likelihood that one or more of them are the true cause.

Other systems, methods, features, and advantages of the present invention will be or become apparent to one with skill in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description and be within the scope of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Many aspects of the invention can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.

FIG. 1 is a block diagram depicting a preferred embodiment of a system for providing automated diagnosis of security and reliability problems for electronic profile or policy enabled systems.

FIG. 2 is a block diagram depicting a more detailed illustrative example of a preferred embodiment of a system for providing automated diagnosis of security and reliability problems for electronic profile or policy enabled systems.

FIG. 3 is a block diagram of an illustrative example of a preferred embodiment of modules of a central diagnosis engine of a system for providing automated diagnosis of security and reliability problems for electronic profile or policy enabled systems.

FIG. 4 is a block diagram of an illustrative example of a preferred embodiment of a hierarchical vulnerability database structure of a system for automated diagnosis of security and reliability problems for electronic profile or policy enabled systems.

FIGS. 5A and 5B are flowcharts depicting functionality, in accordance with one preferred embodiment, of an example of utilizing an implementation of automated diagnosis of security and reliability problems for electronic profile or policy enabled systems.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Disclosed herein are systems and methods for the automated diagnosis of security and reliability problems for electronic profile or policy enabled systems. To facilitate description of the inventive system, an example system that can be used to implement the automated diagnosis of security and reliability problems for electronic profile or policy enabled systems is discussed with reference to the figures. Although this system is described in detail, it will be appreciated that this system is provided for purposes of illustration only and that various modifications are feasible without departing from the inventive concept. After the example system has been described, an example of operation of the system will be provided to explain the manner in which the system can be used to provide the automated diagnosis of security and reliability problems for electronic profile or policy enabled systems.

Referring now in more detail to the drawings, in which like numerals indicate corresponding parts throughout the several views, FIG. 1 is a block diagram depicting a preferred embodiment of a system 100 for automated diagnosis of security and reliability problems for electronic profile or policy enabled systems. By way of explanation, policy refers to system configuration information. Changing the operation of the system (or a part of the system) involves modifying policy or profile information. Profiles are used to divide entities into different groups or categories to reduce the information and effort needed to manage the network or system. For example, in a network, routers might be divided into edge, intermediate and core routers (i.e., three router profiles). Each profile has a different policy definition defining membership. Within the policy for that network, different rules and configurations are delineated for each group. Thus, if a particular router is a core router, a portion of its default configuration is defined as part of a “core router” configuration and is identical to that of other core routers, thus reducing the amount of policy information needed for the profile of core routers.
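To make the profile idea concrete, the following is a minimal sketch of how a router's effective configuration might be derived from its profile. It assumes a router's effective configuration is its profile's shared defaults with device-specific settings layered on top. The profile names follow the example above. The configuration keys and values (`acl`, `qos`, `hostname`) and the function `effective_config` are hypothetical illustrations, not part of the described system.

```python
from typing import Dict

# Hypothetical default configurations per router profile (names and keys are
# illustrative only; the text does not specify any configuration syntax).
PROFILE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "edge": {"acl": "strict", "qos": "customer"},
    "intermediate": {"acl": "standard", "qos": "transit"},
    "core": {"acl": "minimal", "qos": "backbone"},
}


def effective_config(profile: str, overrides: Dict[str, str]) -> Dict[str, str]:
    """Return a router's configuration: the defaults shared by its profile,
    with any device-specific settings layered on top."""
    config = dict(PROFILE_DEFAULTS[profile])  # copy so the shared defaults stay untouched
    config.update(overrides)
    return config


# Two core routers share the profile defaults; only their overrides are stored per device.
core_1 = effective_config("core", {"hostname": "core-1"})
core_2 = effective_config("core", {"hostname": "core-2", "qos": "priority"})
print(core_1)  # {'acl': 'minimal', 'qos': 'backbone', 'hostname': 'core-1'}
```

Only the per-device overrides differ between the two core routers. This is the saving in policy information that profiles are meant to provide.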

The system 100 includes a user processing device 102, a provider network 104, a computing device 108, and a plurality of databases 112, 114. The computing device 108 depicts an illustrative example of an implementation of automated diagnosis of security and reliability problems for electronic profile or policy enabled systems and includes, among others, logic configured to provide automated diagnosis of security and reliability problems. In one preferred embodiment, information stored in databases 112, 114 is organized as fields, records, or files, etc. In another preferred embodiment, the databases 112, 114 are accessible to the digital computer 108 via a system I/O interface 126. In yet another preferred embodiment, the digital computer 108 is configured to include the databases 112, 114 in memory. In still another preferred embodiment, the databases reside on a storage server (not shown) accessible by the digital computer 108.

The provider network 104 may be any type of communications network employing any network topology, transmission medium, or network protocol. For example, such a network may be any public or private packet-switched or other data network, including the Internet, a circuit-switched network, such as a public switched telephone network (PSTN), a wireless network, or any other desired communications infrastructure and/or combination of infrastructures. Alternatively, the user could interact directly with the computing device 108 instead of via the provider network 104 and the user processing device 102.

Generally, in terms of hardware architecture, as shown in FIG. 1, the digital computer 108 includes, inter alia, a processor 120 and memory 122. Input and/or output (I/O) devices (or peripherals) can be communicatively coupled to a local interface 124 via a system I/O interface 126, or directly connected to the local interface 124. The local interface 124 can be, for example but not limited to, one or more buses or other wired or wireless connections, as is known in the art. The local interface 124 may have additional elements, which are omitted for simplicity, such as controllers, buffers (caches), drivers, repeaters, and receivers, to enable communications. Further, the local interface may include address, control, and/or data connections to enable appropriate communications among the aforementioned components.

The processor 120 is a hardware device for executing software, particularly that stored in memory 122. The processor 120 can be any custom made or commercially available processor, a central processing unit (CPU), an auxiliary processor among several processors, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, or generally any device for executing software instructions.

The memory 122 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). Moreover, the memory 122 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 122 can have a distributed architecture, where various components are situated remote from one another, but can be accessed by the processor 120.

The software and/or firmware in memory 122 may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. In the example of FIG. 1, the software in the memory 122 can include logic 130 for the automated diagnosis of security and reliability problems, and a suitable operating system (O/S) 128. The operating system essentially controls the execution of other computer programs, and provides scheduling, input-output control, file and data management, memory management, and communication control and related services.

The logic 130 is a source program, executable program (object code), script, or any other entity comprising a set of instructions to be performed. When the logic 130 is implemented as a source program, the program needs to be translated via a compiler, assembler, interpreter, or the like, which may or may not be included within the memory 122, so as to operate properly in connection with the O/S. Furthermore, logic 130 can be written in (a) an object oriented programming language, which has classes of data and methods, or (b) a procedural programming language, which has routines, subroutines, and/or functions, for example but not limited to, C, C++, Pascal, Basic, Fortran, Cobol, Perl, Java, and Ada.

The I/O devices may include input devices, for example but not limited to, a keyboard, mouse, scanner, microphone, etc. Furthermore, the I/O devices may also include output devices, for example but not limited to, a printer, display, etc. The I/O devices may further include devices that communicate both inputs and outputs, for instance but not limited to, a modulator/demodulator (modem; for accessing another device, system, or network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, etc. Finally, I/O 126 may couple to the provider network 104 that is configured to communicate with the user processing device 102.

When the logic 130 is implemented in software, as is shown in FIG. 1, it should be noted that logic 130 can be stored on any computer-readable medium for use by or in connection with any computer related system or method. The logic 130 can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). Note that the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

In an alternative embodiment, where the logic 130 is implemented in hardware, the logic 130 can be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.

FIG. 2 is a block diagram depicting a more detailed illustrative example of a preferred embodiment of a system 200 for providing automated diagnosis of security and reliability problems for electronic profile or policy enabled systems. The system 200 includes the computing device 108 that communicates with the user processing device 102, provider network 104, and databases 112, 114 configured as an index database (EDD) 210 and a hierarchical vulnerability database (HVD) 212. The computing device 108 further includes memory 122 having operating system 128 and logic 130 configured as a central diagnosis engine 206, presentation module 204 and database interface module 208. Further, computing device 108 includes local interface 124, processor 120, network interface card 214 and system interfaces 126, 126A. In one example, the user processing device 102 communicates with the computing device 108 via the I/O 126A. In another preferred embodiment, the user processing device 102 communicates with the computing device 108 via the provider network 104. In a preferred embodiment, the network interface card 214, I/O 126, and database interface module 208 are utilized for communicating between the provider network 104 and the databases 210, 212.

The central diagnosis engine (CDE) 206 provides an interface and algorithmic intelligence between the user processing device 102, presentation module 204 and the databases 210, 212 via the database interface module 208. In a preferred embodiment, the CDE 206 is configured to receive user input describing or relating to a security or reliability problem, to determine possible causes of the problem, and to access the databases 210, 212 to verify the result. Preferably, the CDE 206 accesses the EDD 210 for customer records and cycles through the HVD 212 starting at a general level for the element on which the problem occurred or was noted, then branching down through the levels using policy-based element-descriptive information in an attempt to verify that a lowest level database page reached contains vulnerability data that corresponds to the problem encountered. After verification and final ranking in terms of the likelihood that potential causes are actual causes, the results are accumulated and made available to the presentation module 204 and/or the user's processing device 102.

The presentation module 204 summarizes and formats the accumulated results, for instance, potential causes of a problem and associated rankings or likelihood, in an appropriate manner to be informative and useful to a user. In one preferred embodiment, the presentation module 204 utilizes standard software engineering techniques to accomplish the presentation of accumulated vulnerability results to the user. For example, application programming interfaces can be utilized that are consistent with the user's operating system, such as Unix, Linux, Windows, etc., with specific configurations being dependent upon the user's particular implementation.

The database interface module 208 provides standard functionality utilizing, for instance, a structured query language to enable provisioning and access of the databases, EDD 210 and HVD 212. In an alternative preferred embodiment, an additional interface, such as a provisioning interface, can be provided for provisioning of the databases.

In a preferred embodiment, the HVD 212 is pre-provisioned with descriptive and element data such that correct results can be achieved. Preferably, data in the HVD 212 is arranged hierarchically and includes a plurality of database pages (shown in FIG. 4), each having a page index, data section and selector section. The HVD 212 preferably organizes a range of information, or a continuum, into a set of discrete stages of HVD pages that allow for repeated input via the sort of questions that an expert would typically ask at each stage.

For example, a top section of the HVD structure includes information necessary to answer broad or general questions and/or symptoms that would naturally occur with customer inquiries. An inquiry to a bottom section of the HVD structure results in specific helpful information that (i) answers the inquiry and/or (ii) provides specific advice for remedial action. Intermediate sections of the HVD structure are preferably pre-provisioned with information and prompting questions that lead the user from a top HVD page to the desired bottom page(s), and allow for branching to related HVD pages as needed to identify all associated helpful information.
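A minimal sketch of how HVD pages and a top-to-bottom walk through them might be modeled follows. The page index, data section and selector section come from the description above. The rest is assumed: the selector is modeled as a dict from an answer to the next page's index, each page carries one prompting question, and `HVDPage`, `traverse` and the sample pages are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HVDPage:
    index: str                      # page index
    data: str                       # data section: information offered at this stage
    question: Optional[str] = None  # prompting question an expert would ask here
    # Selector section: maps an answer to the index of the next, more specific page.
    selector: Dict[str, str] = field(default_factory=dict)


def traverse(pages: Dict[str, HVDPage], start: str, answers: Dict[str, str]) -> List[str]:
    """Walk from a general (top) page toward a specific (bottom) page.

    Stops at a page with an empty selector (a bottom page) or when no answer
    selects a next page. Returns the indices of the pages visited."""
    path = [start]
    page = pages[start]
    while page.selector and page.question in answers:
        next_index = page.selector.get(answers[page.question])
        if next_index is None:
            break
        path.append(next_index)
        page = pages[next_index]
    return path


pages = {
    "1": HVDPage("1", "Firewall problems", "Which symptom?",
                 {"blocked traffic": "1.1", "unexpected traffic": "1.2"}),
    "1.1": HVDPage("1.1", "Traffic blocked", "Was an ACL changed?",
                   {"yes": "1.1.1", "no": "1.1.2"}),
    "1.1.1": HVDPage("1.1.1", "Review the recent ACL change; roll it back if it denies the flow."),
    "1.1.2": HVDPage("1.1.2", "Check interface state and routing."),
    "1.2": HVDPage("1.2", "Unexpected traffic admitted: look for newly opened ports."),
}
path = traverse(pages, "1", {"Which symptom?": "blocked traffic", "Was an ACL changed?": "yes"})
print(path)  # ['1', '1.1', '1.1.1']
```

The walk stops as soon as a bottom page is reached. In the HVD, that page would hold the specific remedial advice.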

In a preferred embodiment, the EDD 210 includes customer records and anyother pertinent customer information.

FIG. 3 is a block diagram of an illustrative example of a preferred embodiment of modules of a central diagnosis engine 206 of a system for automated diagnosis of security and reliability problems for electronic profile or policy enabled systems. In a preferred embodiment, the central diagnosis engine 206 includes a possible cause accumulator module 302 that couples to the presentation module 204, distance estimator module 304, verifier module 306, and rank estimator module 308; a problem accumulator module 310 that couples to an input parser/filter module 312 and a cause estimator module 314; and a policy interpreter module 316 that couples to the verifier module 306 (and optionally to a policy-management system 320 via the network 104). The verifier module 306 is also coupled to the database interface module 208. An adaptive logger 318 couples to a policy-based management system 320 either directly or via the provider network 104.

The input parser/filter module 312 receives security or reliability problem description input from a user's processing device 102 (or an administrator) in a plurality of formats, such as data files of an acceptable format, or other input provided either automatically by monitoring, sensor or management systems, or manually in response to prompting from the presentation module 204, among others. In one preferred embodiment, the input parser/filter module 312 utilizes standard software engineering techniques to convert the input data into data usable by the problem accumulator module 310. The input parser/filter module 312 preferably interacts with the user's processing device 102 via application programming interfaces that are consistent with the user's operating system, for instance, Unix, Linux, Windows, etc., with the details of the interfaces being dependent upon the specific implementation, including the choice of software language and design. In a preferred embodiment, the implementation is selected to perform the specific conversions needed for each allowed input type. During the conversion process, the input parser/filter module 312 filters out extraneous data, such that only pertinent input remains. In an alternative embodiment, sensor and/or monitoring systems 322 from which the input parser/filter module 312 could receive security problem data include firewalls, security or reliability related sensors, other monitoring sensors, other monitoring devices, and intrusion detection systems (collectively referred to as IDS). IDSs in particular are typically designed to provide alarms and alerts, with associated electronic messages, when detecting security problems or attacks in progress, and thus are suitable inputs to the input parser/filter module 312.
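The conversion and filtering step might look like the following minimal sketch. It assumes two input types: a `key=value; key=value` alert line, such as an IDS might emit, and an already-structured dict from manual entry. It also assumes a fixed set of pertinent fields. The field names and `parse_input` are hypothetical.

```python
from typing import Dict, Union

# Hypothetical set of fields assumed to be pertinent to the problem accumulator.
PERTINENT_FIELDS = {"element", "component", "time", "problem_type"}


def parse_input(raw: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Convert one input (a 'key=value; key=value' alert line or an
    already-structured dict) into a dict holding only pertinent fields."""
    if isinstance(raw, str):
        record: Dict[str, str] = {}
        for part in raw.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                record[key.strip()] = value.strip()
    else:
        record = dict(raw)
    # Filter out extraneous data so that only pertinent input remains.
    return {k: v for k, v in record.items() if k in PERTINENT_FIELDS}


alert = "element=fw-1; problem_type=port_scan; time=1700000000; severity=high; vendor_id=42"
parsed = parse_input(alert)
```

Each additional input type would get its own conversion branch, in line with performing "the specific conversions needed for each allowed input type."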

In a preferred embodiment, the problem accumulator module 310 receives problem descriptive data from the input parser/filter module 312 and cycles, continuing to receive data until the problem is fully described. The problem can be fully described, for instance, when a user finishes inputting data or when the automatic transfer of information from a sensor or monitoring system 322 is complete. In a preferred embodiment, the problem accumulator module 310 provides the completed set of problem descriptive data to the cause estimator module 314.
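The accumulate-until-complete cycle can be sketched as below. Completion is assumed to be signalled by a hypothetical end-of-input marker or by the input source running out. Fragments arriving after the marker are ignored. `END_OF_INPUT` and `accumulate_problem` are illustrative names only.

```python
from typing import Dict, Iterable

END_OF_INPUT = "__done__"  # hypothetical marker meaning the problem is fully described


def accumulate_problem(fragments: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge parsed fragments until the end-of-input marker arrives (or the
    source is exhausted), then return the completed problem description."""
    problem: Dict[str, str] = {}
    for fragment in fragments:
        if fragment.get(END_OF_INPUT):
            break
        problem.update(fragment)
    return problem


completed = accumulate_problem([
    {"element": "fw-1"},
    {"time": "1700000000"},
    {END_OF_INPUT: "yes"},
    {"ignored": "arrives after completion"},
])
```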

In a preferred embodiment, the cause estimator module 314 interfaces with the adaptive logger 318 to identify any changes in policy (i.e., changes by which the overall policy controlled system is modified) associated with the available parameters of the problem described. These parameters may include such things as the piece of equipment via which or in which the problem is noted, the system components that the problem is affecting, and the time at which the problem was first noted, among others. Preferably, the cause estimator module 314 attempts to discover, via the adaptive logger 318, any policy changes that may have caused or contributed to causing the problem. Policy changes associated with system components “close to” but not directly coinciding with the problem parameters are also obtained. Subsequently, the cause estimator module 314 passes all the discovered policy changes to the rank estimator module 308, which ranks the discovered policy changes by their “closeness” to the exact problem parameters.
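One way the cause estimator could gather candidate changes is sketched below. The document does not define “close to.” Here it is assumed to mean the problem element plus its direct neighbors in a topology map. `PolicyChange`, `estimate_causes` and the `neighbors` map are hypothetical. Time is left for the rank estimator to handle.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class PolicyChange:
    element: str      # equipment on which the change was made
    time: float       # when the change was logged
    description: str


def estimate_causes(log: List[PolicyChange], problem_element: str,
                    neighbors: Dict[str, Set[str]]) -> List[PolicyChange]:
    """Return logged policy changes made on the problem element or on
    elements assumed to be 'close to' it (here: its direct neighbors)."""
    nearby = {problem_element} | neighbors.get(problem_element, set())
    return [change for change in log if change.element in nearby]


log = [
    PolicyChange("fw-1", 100.0, "opened port 23"),
    PolicyChange("rtr-2", 90.0, "added static route"),
    PolicyChange("srv-9", 95.0, "patched kernel"),
]
causes = estimate_causes(log, "fw-1", neighbors={"fw-1": {"rtr-2"}})
```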

In a preferred embodiment, the adaptive logger 318 interfaces with a system's policy management equipment (or device, application or system), such as management system 320, such that the adaptive logger 318 can access system policy and policy changes. In a preferred embodiment, electronic links from the management system 320 provide for inputs, modifications, deletions and monitoring of a policy controlled system. An example of a policy controlled system is a communications network that includes switches, routers, computers, sensors, monitors, interfaces, etc. that are controlled by effective use of configuration information. In an example, the adaptive logger 318 and input parser/filter module 312 are operatively coupled to the management system 320, device or other suitable component to obtain, copy or record the policy information, such as routing configurations, and changes to policy, such as logged configuration changes. The electronic links between these entities could be accomplished via any method utilizing any known standard and/or proprietary interfaces, communications networks, interconnection media, network, design, or protocol. Preferably, secure methods such as SSL, SSH, or IPsec would be utilized to provide appropriate security protection.

The adaptive logger 318 also interfaces with the cause estimator module 314. In an example, the adaptive logger 318 records (i.e., logs) policy changes as they occur, hence the term “logger.” The adaptive logger 318 may adjust its recording in an adaptive manner by, for example, modifying the focus or granularity, i.e., level of detail, in response to problems encountered. As an example, if security or reliability problems are encountered with router interface configuration, the adaptive logger 318 responds by recording such configuration changes in greater detail than normal. This allows similar problems to be dealt with more effectively in the future. In a preferred embodiment, after a configurable time period, such as a week or month, the recording granularity of the adaptive logger 318 could revert to its normal setting.
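The adaptive granularity described above can be sketched as a logger that raises detail for an area when a problem is noted and reverts after a configurable period. It assumes only two granularity levels, a summary and an added detail field, and timestamps in seconds. `AdaptiveLogger`, its methods and the area names are hypothetical.

```python
from typing import Dict, List

WEEK = 7 * 24 * 3600  # configurable revert period, in seconds


class AdaptiveLogger:
    """Logs policy changes, recording extra detail for configuration areas
    where problems were recently noted."""

    def __init__(self, revert_after: float = WEEK) -> None:
        self.revert_after = revert_after
        self.raised_at: Dict[str, float] = {}  # area -> time its detail level was raised
        self.entries: List[Dict[str, str]] = []

    def note_problem(self, area: str, now: float) -> None:
        """Raise the recording granularity for an area after a problem."""
        self.raised_at[area] = now

    def detailed(self, area: str, now: float) -> bool:
        raised = self.raised_at.get(area)
        if raised is not None and now - raised < self.revert_after:
            return True
        self.raised_at.pop(area, None)  # period elapsed (or never raised): normal granularity
        return False

    def record(self, area: str, summary: str, detail: str, now: float) -> Dict[str, str]:
        entry = {"area": area, "summary": summary, "time": str(now)}
        if self.detailed(area, now):
            entry["detail"] = detail  # extra detail only while the focus is raised
        self.entries.append(entry)
        return entry


logger = AdaptiveLogger()
logger.note_problem("router_interface", now=0)
soon = logger.record("router_interface", "ACL edited", "full ACL diff", now=3600)
later = logger.record("router_interface", "ACL edited", "full ACL diff", now=WEEK + 1)
```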

In a preferred embodiment, the rank estimator module 308 receives a set of possible causes or policy changes from the cause estimator module 314, along with any associated differences in available parameters for which the rank estimator module 308 has relevant data. In an example, for the parameter of time, this would be the difference between the time each policy change occurred or was received and the exact time of the problem itself (or the time when it was noted). Ranking on these differences from the time the problem occurred (or was noted by, for example, a user, an administrator, or a monitoring system or sensor system) causes the potential causes that are closest in time to the problem occurrence to be ranked highest, i.e., designated as more likely to be the actual cause of the problem or more likely to be a contributing factor. In a preferred embodiment, the rank estimator module 308 performs ranking using a subset of available parameters for which it has parameter differences, and subsequently passes on the potential causes and associated rankings to the possible cause accumulator module 302. Preferably, the rank estimator module 308 passes the potential causes and ranking information to the possible cause accumulator module 302 one at a time.

In an example, a ranking process uses a single chosen scale (e.g., 0 to 100) throughout for numerical simplicity and consistency. As policy changes are identified, certain parameters associated with the problem definition can be used to help rank the policy change (potential cause) in order of “closeness” to the problem. Parameters include time, equipment, sub-equipment (e.g., an application residing on a particular piece of equipment), proximity, etc. Of these, the rank estimator module 308 preferably has knowledge of data and logic relating to time, equipment, and sub-equipment. In some embodiments, the rank estimator module 308 will not have proximity information, since that information is preferably resident in the EDD 210 and not directly available to the rank estimator module 308. Thus, the rank estimator module 308 includes information on the time of each policy change (potential cause) and the piece of equipment or sub-equipment for which that change was made. Therefore, ranking preferably utilizes each of these parameters.

In an example, regarding the time of the policy change, policy changes occurring subsequent to the problem time (such as the time the problem was noted or reported) are eliminated. Regarding equipment, sub-equipment or element or sub-element of the overall policy controlled system, policy changes occurring on the same piece of equipment as the problem are ranked highest. Further, certain types of equipment are inherently more security sensitive or reliability sensitive for certain types of problems noted, such that policy changes occurring on these types of equipment will typically be ranked higher than others for those types of problems. Other rules like these, or variations of these rules, can be utilized to refine the ranking process. For instance, a set of if/then rules can be used to calculate specific sensitivities (for example, “for a certain problem type, consult a list of equipment types giving their respective sensitivities on a 0 to 100 scale”). Preferably, these rules and the quantitative sensitivities are configurable by the user or administrator. In some embodiments, default rules and sensitivities are provided.
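The time-based elimination and the configurable if/then sensitivity rules might be expressed as in the following minimal sketch. The rule table maps a problem type to equipment-type sensitivities on a 0 to 100 scale. The "security" values reuse the example type values given in this description. The problem-type key, the default of 0 for unlisted equipment, and the function names are assumptions.

```python
from typing import Dict, List, Tuple

# Default if/then rules: for a problem type, a list of equipment types with their
# sensitivities (0-100). The "security" values mirror the example type values in
# the text; the key name is hypothetical and the table is meant to be configurable.
DEFAULT_RULES: Dict[str, Dict[str, int]] = {
    "security": {"firewall": 75, "ids": 75, "server": 40, "router": 10, "ethernet_switch": 10},
}


def sensitivity(problem_type: str, equipment_type: str,
                rules: Dict[str, Dict[str, int]] = DEFAULT_RULES, default: int = 0) -> int:
    """If the problem has this type, then return the equipment type's sensitivity."""
    return rules.get(problem_type, {}).get(equipment_type, default)


def drop_later_changes(changes: List[Tuple[str, float]],
                       problem_time: float) -> List[Tuple[str, float]]:
    """Eliminate (change_id, change_time) pairs made after the problem time."""
    return [(cid, t) for cid, t in changes if t <= problem_time]


user_rules = {"reliability": {"router": 80}}  # an administrator-supplied rule set
```

Passing `user_rules` in place of the defaults shows how an administrator could override the provided sensitivities.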

The ranking process seeks to determine the difference between each policy change parameter and the associated problem parameter. For instance, the difference in time values between a policy change and the problem occurrence is of value in ranking. Potential causes (policy changes) closer in time to the problem (i.e., having the lowest difference) are ranked higher than others.

Regarding whether the problem occurred on the same piece of equipment as the policy change, the ranking process performed by the rank estimator module 308 determines if the policy change and problem are co-located, and if so, assigns a difference of zero (otherwise a large difference is assigned, e.g., 50). Regarding different types of equipment, type values are used with an assumed problem parameter value of 100. For example, if the policy change occurs on an element that is highly security sensitive, for example a firewall or intrusion detection system, a high type value is assigned (e.g., 75). A less security sensitive element, for example a router or Ethernet switch, is assigned a low value (e.g., 10). For equipment between these two levels of security, for instance, a server, an intermediate type value is assigned (e.g., 40). Type values that reflect established security knowledge and expertise can be assigned for all types of equipment and sub-equipment. Similar values can be assigned for reliability components. In an example, for each policy change (or potential cause) the actual parameter difference is the assumed problem parameter value of 100 minus the associated type value, so that a change on a firewall yields a difference of 25 and a change on a router a difference of 90.
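The per-parameter differences for one policy change can be computed as in the following minimal sketch. It uses the example numbers from this paragraph: 0 or 50 for co-location, and 100 minus the type value for equipment type. Times are assumed to be in hours, and changes later than the problem are assumed to have already been eliminated. An unlisted equipment type is treated as type value 0, the largest difference. `Change` and `parameter_differences` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict

PROBLEM_TYPE_VALUE = 100  # assumed problem parameter value for equipment type
NOT_COLOCATED = 50        # difference when change and problem are on different equipment
# Example security type values from the text (firewall/IDS high, server mid, router/switch low).
TYPE_VALUES: Dict[str, int] = {"firewall": 75, "ids": 75, "server": 40,
                               "router": 10, "ethernet_switch": 10}


@dataclass
class Change:
    element: str         # equipment the policy change was made on
    equipment_type: str
    time: float          # hours (unit assumed); not later than the problem time


def parameter_differences(change: Change, problem_element: str,
                          problem_time: float) -> Dict[str, float]:
    return {
        "time": problem_time - change.time,
        "location": 0 if change.element == problem_element else NOT_COLOCATED,
        # Unlisted equipment types get type value 0, i.e. the largest difference.
        "type": PROBLEM_TYPE_VALUE - TYPE_VALUES.get(change.equipment_type, 0),
    }


fw_change = Change("fw-1", "firewall", time=8.0)
rtr_change = Change("rtr-2", "router", time=9.5)
print(parameter_differences(fw_change, "fw-1", problem_time=10.0))
# {'time': 2.0, 'location': 0, 'type': 25}
print(parameter_differences(rtr_change, "fw-1", problem_time=10.0))
# {'time': 0.5, 'location': 50, 'type': 90}
```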

Once the difference between each policy change parameter and the problem parameter is determined, multiplicative weightings can be applied if it is desired to magnify the effects of any particular type of differences, for instance time. Subsequently, the differences (or weighted differences) are summed to calculate a combined difference over the available parameters for that policy change (potential cause). When the combined differences for each potential cause are available, the potential causes can be placed in order (ranked) from lowest difference to highest difference. This ordering reflects the potential causes from closest to the problem to farthest from the problem.
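The weighting, summation and ordering steps can be sketched as follows. This is only an illustration. The parameter names (`time`, `location`, `type`), the default weight of 1.0, and the candidate values in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple


def combined_difference(diffs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum per-parameter differences, each multiplied by its weight (default 1.0)."""
    return sum(weights.get(name, 1.0) * d for name, d in diffs.items())


def rank(candidates: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order potential causes from lowest combined difference (closest) to highest."""
    scored = [(cid, combined_difference(d, weights)) for cid, d in candidates.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical differences: time in days, co-location difference, type difference.
candidates = {
    "c1": {"time": 7, "location": 0, "type": 60},
    "c2": {"time": 2, "location": 0, "type": 60},
    "c3": {"time": 14, "location": 50, "type": 90},
}
ranked = rank(candidates, weights={"time": 2.0})  # magnify the effect of time
```

With time weighted by 2, the combined differences are 64 for c2, 74 for c1 and 168 for c3. The resulting order is c2, c1, c3.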

In a preferred embodiment, the possible cause accumulator module 302 receives the potential causes and ranking information from the rank estimator module 308 (preferably one at a time) and accumulates the information until ranking is complete and all the potential causes associated with the problem are received. The possible cause accumulator module 302 interfaces with the distance estimator module 304, verifier module 306 and presentation module 204. The possible cause accumulator module 302 interfaces with the verifier module 306 to utilize the databases 210, 212 to check that the possible causes relate to the problem (as described later). The possible cause accumulator module 302 receives “distance” information (i.e., a measure of the likelihood that a potential cause is the actual cause) for each verified potential cause from the distance estimator module 304. When verification is complete, the possible cause accumulator module 302 provides a list of potential causes and their distances to the presentation module 204.

In a preferred embodiment, the distance estimator module 304 calculates the distances associated with potential causes using input from the verifier module 306. The distance estimator module 304 provides the distance information for each potential cause to the possible cause accumulator module 302.

Distance and its determination are described as follows. Potential causes that are closer to the problem (i.e., having less “distance” between them and the problem) are considered more likely to be actual causes or contributors to the problem. The final calculated distance (for each potential cause) includes the initial rankings (i.e., an initial measure of distance) adjusted appropriately by the distances determined in the verification process. Proximity information determined from the EDD 210 can be used to adjust the initial distances (rankings).

Proximity is an indication of how far away the equipment associated with the particular policy change is from the problem or from the equipment, system or application on which the problem was noted or occurred. For example, proximity of routers in communications networks is typically given in terms of “hops.” A router connected directly to another router is one hop from that router. A router connected to a router via a router in between them is two hops from that router. A router connected to a router via two sequential routers in between them is three hops from that router, etc. A proximity indication can be easily extended. For instance, an application on a server is zero hops from that server. An application on a server connected directly to a router is one hop from that router. An application on a server connected to another server via an intervening router is two hops from the second server or any application on the second server, etc.

Preferably, information in the EDD 210 contains descriptive data for the overall policy controlled system, including data that identifies the elements that any particular element is connected to. Thus, this data can readily be used to calculate distances. As an example, the distances between elements in a communications network are readily calculated via mathematical methods used in standard routing algorithms such as open shortest path first (OSPF), as these routing algorithms typically must determine distances in order to choose potential routes between elements comprised of the lowest number of hops.
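A hop-count sketch follows. OSPF computes shortest paths with Dijkstra's algorithm over configured link costs. When every link is counted as one hop, a breadth-first search gives the same hop counts. The `adjacency` mapping is a hypothetical stand-in for the connectivity data held in the EDD 210, and `hop_distance` is an illustrative name.

```python
from collections import deque
from typing import Dict, Iterable, Optional


def hop_distance(adjacency: Dict[str, Iterable[str]], src: str, dst: str) -> Optional[int]:
    """Breadth-first search: fewest hops between two elements, or None if unreachable."""
    if src == dst:
        return 0  # e.g. an application on a server is zero hops from that server
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        for nbr in adjacency.get(node, ()):
            if nbr == dst:
                return hops + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops + 1))
    return None


# Hypothetical topology (links listed in both directions): s1 - r1 - r2 - r3
net = {"r1": ["r2", "s1"], "r2": ["r1", "r3"], "r3": ["r2"], "s1": ["r1"], "x": []}
```

In this topology, r1 is two hops from r3 via r2, and server s1 is three hops from r3.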

In the verification process for each policy change (potential cause), the EDD 210 data is used to calculate hop-type distances, termed “verification proximity distances,” between the element for which the policy change was made and the element on which the problem occurred or was noted. For each verified potential cause, the original rank (e.g., in the range of 0 to 100) can be adjusted by adding the potential cause's verification proximity distance, normalized to a range of 0 to 100, to its original rank value and then dividing the result by two to maintain a final range of 0 to 100. In an example, final distance values are in the range of 0 to 100, where 0 is best (i.e., most likely to be an actual cause or contributor) and 100 is worst (i.e., least likely to be an actual cause or contributor). Verified potential causes with the lowest final distances will thus have the highest final ranks, and verified potential causes with the highest final distances will have the lowest final ranks. A user/administrator will be presented with the final rank-ordered list of the verified potential causes and the final distance value for each of them. In an alternative embodiment, associated descriptive information obtained during the verification process from the HVD 212, by accumulating descriptive material from the various levels of each verified path, could be presented for review by the user/administrator. The preferred embodiments are not limited to the described distance methods, and other distance, ranking measurements and calculation methods can be employed.
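The averaging step works as follows: the final distance is the original rank plus the normalized proximity, divided by two. The text does not say how hop counts are normalized to 0 to 100, so this sketch assumes linear scaling against a hypothetical cap, `max_hops`. The function names are also illustrative.

```python
def normalize_hops(hops: int, max_hops: int) -> float:
    """Assumed linear normalization of a hop count to 0..100, capped at max_hops."""
    if max_hops <= 0:
        raise ValueError("max_hops must be positive")
    return 100.0 * min(hops, max_hops) / max_hops


def final_distance(original_rank: float, hops: int, max_hops: int = 10) -> float:
    """Average the original rank (0..100) with the normalized proximity (0..100)."""
    return (original_rank + normalize_hops(hops, max_hops)) / 2.0
```

Under these assumptions, a change ranked 20 on the problem element itself (0 hops) ends at 10.0. A change ranked 40 that is 5 hops away (normalized to 50) ends at 45.0. Both results stay within the 0 to 100 range.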

In a preferred embodiment, the verifier module 306 accesses the EDD 210 and HVD 212 to check each potential cause (as identified by the cause estimator module 314 and ranked by the rank estimator module 308) to determine if that potential cause is actually a cause or contributor to the problem or if it is more likely to be unrelated. The verifier module 306 also preferably provides additional ranking-related data (e.g., raw hop distance information obtained from the EDD 210) to the distance estimator module 304 so that final distances can be determined for each verified potential cause.

The verification process is described as follows. In an example, verification is performed to check that the ranked potential causes are reasonable, and not merely coincidental policy changes unrelated to the problem described in the input process. Security and reliability vulnerabilities are determined by using the EDD 210 and HVD 212 as described in U.S. Pat. No. 7,237,266, entitled “Electronic Vulnerability and Reliability Assessment,” and incorporated by this reference herein. In a preferred embodiment, the HVD 212 is configured with system elements at a top level (i.e., general levels) and branches down to a multiplicity of specific security vulnerabilities at the bottom level (i.e., more specific levels). One of the main problem parameters is the element or elements in which the problem occurred or was noted, and this provides a starting point at the top level of the HVD 212 for verification. In an example, the potential causes that need to be verified correspond to at least a partial policy configuration that provides information for the middle level of the HVD 212, which can be augmented by additional policy configuration information obtained by the policy interpreter module 316 if necessary to make progress down through the database. The problem itself is a set of symptoms or “observables” associated with one or more specific security vulnerabilities located at the bottom levels of the HVD 212.

In a preferred embodiment, the verification process is performed by the verifier module 306 utilizing the processes disclosed in U.S. Pat. No. 7,237,266, entitled “Electronic Vulnerability and Reliability Assessment,” to proceed down through the HVD 212 for each potential cause. Each potential cause, together with additional policy information, if needed, provides the policy information that controls the branching downward through the levels of the HVD 212, and thus forms a path from the top level starting point to some point on a bottom level. If this path reaches a bottom most destination matching or approximately matching the identified problem, then the respective potential cause is considered verified. Otherwise, the potential cause is not verified.
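The verified/not-verified decision can be sketched with a simplified tree. The root stands for the top-level entry of the problem element. Policy terms choose the branch at each level, and the bottom node's observables are compared with the problem symptoms. Treating “approximately matching” as a minimum fraction of symptoms covered (`min_overlap`) is an assumption. The `Node` type and `verify` function are hypothetical. The actual traversal is the one disclosed in U.S. Pat. No. 7,237,266.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Node:
    title: str
    children: Dict[str, "Node"] = field(default_factory=dict)  # branching keyword -> child
    observables: Set[str] = field(default_factory=set)          # symptoms at bottom level


def verify(root: Node, policy_terms: Set[str], problem_symptoms: Set[str],
           min_overlap: float = 0.5) -> bool:
    """Follow policy-driven branches down; verified if the bottom matches the problem."""
    node = root
    while node.children:
        nxt = next((c for kw, c in node.children.items() if kw in policy_terms), None)
        if nxt is None:
            return False  # policy information cannot drive the path further down
        node = nxt
    if not problem_symptoms:
        return False
    overlap = len(node.observables & problem_symptoms) / len(problem_symptoms)
    return overlap >= min_overlap


# Hypothetical fragment: server -> SSH -> old SSH version -> remote compromise symptoms.
leaf = Node("SSH remote compromise", observables={"log_altered", "new_account", "root_access"})
root = Node("server", children={"ssh": Node("SSH", children={"ssh_old_version": leaf}),
                                "gui": Node("GUI")})
```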

In a preferred embodiment, the policy interpreter module 316 provides aspects and details of the overall system's policy for the system under review to the verifier module 306 as needed in the verification process, interfacing with the overall system's policy management capability to do so. In an alternative embodiment, the policy interpreter module 316 utilizes a complete set of policy information obtained via the input parser/filter process described in U.S. Pat. No. 7,237,266, entitled “Electronic Vulnerability and Reliability Assessment.”

FIG. 4 is a block diagram of an illustrative example of a preferred embodiment of a hierarchical vulnerability database structure (HVD structure) 400 of a system for automated diagnosis of security and reliability problems for electronic profile or policy enabled systems. The HVD structure 400 includes a plurality of database pages such as page 402. The database page 402 includes a page index 404, a data section 406 and a selector section 408. The database pages 402 may also be referred to as entries or forms. The page index 404 preferably includes an index number and a descriptive title. In a preferred embodiment, the page index 404 is utilized by the HVD structure 400 to retrieve the appropriate database page 402.

The data section 406 includes the actual information and data accumulated and presented to the user regarding details of the identified vulnerability or reliability results. The selector section 408 in a preferred embodiment includes one or more independent lines of data, and up and down links to related database pages. In a preferred embodiment, the selector section 408 includes one or more index numbers as a database link to any related pages and a matching field which contains a list of keywords, associated numeric ranges, etc., all of which can be used in the matching process to select subsequent pages to access. Thus in a preferred embodiment, each independent line of the selector section contains one or more keywords plus one or more specific database page link indices with which these keywords are specifically associated (as well as optional data such as related numeric ranges for alternate or advanced matching/filtering). In an alternative embodiment, the selector section 408 includes an empty or “null” downward-pointing indicator line if the page is a “bottom page.”

In the illustrative example shown in FIG. 4, a cycle typically begins at the top database page 402. In an example, the database page 402 contains mostly selector section information. The database pages at level 410 each include selector section 408 information; however, the amount of solution data in the data section 406 is increasing. At level 412, the database pages include less selector section 408 information and more solution data in the data section 406. At level 414, the database pages include more detailed solution data in the data section 406 and very little information in the selector section 408. Level 416 shows the bottom of the HVD structure 400 for the illustrative example. Database page 420 includes a null section 422 indicating that this page is the bottom page. The bottom database page 420 does not include downward-pointing selector information, and thus a cycle stops at this page unless the cycle was previously stopped.

As shown in FIG. 4, the database pages are preferably organized in a hierarchical structure. For example, a cycle or search typically begins at HVD page 402. The selector section 408 of this page 402 provides links to a number of related pages. In an example, only one of these pages, HVD page 411, contains relevant information. Another cycle based on keywords identified in HVD page 411 uncovers links to the next level of HVD pages, with HVD page 413 providing relevant information. Another cycle based on keywords identified in HVD page 413 reveals a link to HVD page 415. Another cycle based on keywords identified in HVD page 415 reveals a link to HVD page 420. In this example, HVD page 420 is the bottom page, as indicated by the null 422 reference, and thus no downward-pointing selector information is available and the cycle ends.
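Below is a sketch of the page layout and the cycle shown in FIG. 4. Each page has an index, a title, a data section, and a selector section made of independent lines. Each selector line pairs keywords with link indices. A page with no selector lines acts as the “null” bottom page. For simplicity, matching uses a single keyword set instead of keywords gathered from each visited page, and numeric-range matching is omitted. All class and function names, and the keywords, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SelectorLine:
    keywords: Set[str]   # matching field
    links: List[int]     # indices of lower-level pages


@dataclass
class Page:
    index: int
    title: str
    data: str = ""
    selectors: List[SelectorLine] = field(default_factory=list)  # empty = bottom page


def cycle(pages: Dict[int, Page], start: int, keywords: Set[str]) -> List[int]:
    """Follow the first matching selector line on each page until a bottom page."""
    path = [start]
    page = pages[start]
    while page.selectors:
        line = next((s for s in page.selectors if s.keywords & keywords), None)
        if line is None or not line.links:
            break  # no relevant link: the cycle stops early
        page = pages[line.links[0]]
        path.append(page.index)
    return path


# Hypothetical pages mirroring the FIG. 4 walk 402 -> 411 -> 413 -> 415 -> 420.
pages = {
    402: Page(402, "top", selectors=[SelectorLine({"other"}, [410]),
                                     SelectorLine({"ssh"}, [411])]),
    410: Page(410, "unrelated"),
    411: Page(411, "ssh", selectors=[SelectorLine({"version"}, [413])]),
    413: Page(413, "ssh version", selectors=[SelectorLine({"remote_login"}, [415])]),
    415: Page(415, "remote login", selectors=[SelectorLine({"root_access"}, [420])]),
    420: Page(420, "bottom", data="solution details"),
}
path = cycle(pages, 402, {"ssh", "version", "remote_login", "root_access"})
```

The resulting path is 402, 411, 413, 415, 420. It stops at page 420 because that page has no downward-pointing selector lines.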

FIGS. 5A and 5B are flowcharts depicting functionality, in accordance with one preferred embodiment, of an implementation of an example of utilizing an automated diagnosis of security and reliability problems for electronic profile or policy enabled systems. Referring to FIG. 5A, the process begins at 502. In an example, a user has discovered that a server of a policy controlled system has recently been compromised via a hacker attack. Key files on the server, for example administrative log files, have been altered. Further, the hacker apparently added a new user account in order to facilitate further unauthorized use. At 504, problem input is obtained. In a preferred embodiment, the problem is input either manually by the user or administrator, or automatically (e.g., by a management system). At 506, an adaptive logger is searched for potentially related changes by consulting its log memory, in which time-stamped policy/configuration changes for the network/system are preferably stored as they occur (e.g., obtained via the network management system). In an example, the adaptive logger identifies three recent policy (i.e., configuration) changes that fall within the configurable parameter limits. The first change is on the server, and is essentially a software application/package upgrade that happens to include an old version of secure shell (SSH) software. In some cases, software upgrades include components that are older (rather than newer) than components already installed. The second change identified is also on the server but is a minor change to a graphical user interface (GUI). The third change is to an adjacent router, but is also minor.

At 508, the three potential causes are ranked in a preliminary fashion. All three changes are potential causes of the problem that require verification to determine if any of them could be actual causes of the problem. The first and second changes are ranked highest because they are on the server itself. The second change is ranked slightly higher than the first change since it occurred only two days prior to the time the problem was noted, in contrast to the first change that occurred a week prior. The third change is ranked lower because it occurred two weeks earlier and is located on an element a hop away, rather than being on the same element as the problem. In addition, the third change is on a router, which via the configurable rank adjustment rules is identified as less security sensitive (in cases of security problems noted on a server).

At 510, if the problem description is completed by a problem accumulator module, then at 512, parameters of the adaptive logger are modified as needed. If the problem description is not finished, then the problem input process continues at 504. At 514, recording focus and detail are adjusted (e.g., in this example, the adaptive logger will begin to log/record policy/configuration changes that occur on servers and routers with greater detail, at least for a configurable temporary time period). The process continues on FIG. 5B.
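The adaptive logger behaviour from steps 506, 512 and 514 can be sketched as follows. The logger stores time-stamped changes as they occur. It returns the changes that fall inside a configurable lookback window ending at the problem time. It can temporarily raise the recording detail for selected equipment types. The class name, the 30-day default window and the tuple layout are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# (timestamp, equipment type, element, change description)
Entry = Tuple[datetime, str, str, str]


@dataclass
class AdaptiveLogger:
    lookback: timedelta = timedelta(days=30)  # hypothetical configurable parameter limit
    entries: List[Entry] = field(default_factory=list)
    detail_until: Dict[str, datetime] = field(default_factory=dict)  # type -> end of focus

    def record(self, when: datetime, equipment_type: str, element: str, change: str) -> None:
        """Store a time-stamped policy/configuration change as it occurs."""
        self.entries.append((when, equipment_type, element, change))

    def related_changes(self, problem_time: datetime) -> List[Entry]:
        """Changes inside the lookback window that ends at the problem time."""
        start = problem_time - self.lookback
        return [e for e in self.entries if start <= e[0] <= problem_time]

    def focus(self, equipment_types: Iterable[str], now: datetime, period: timedelta) -> None:
        """Record selected equipment types in greater detail for a temporary period."""
        for t in equipment_types:
            self.detail_until[t] = now + period

    def detailed(self, equipment_type: str, now: datetime) -> bool:
        """True while the temporary detailed-recording period is still active."""
        return now < self.detail_until.get(equipment_type, datetime.min)
```

After the problem in the example, `focus(["server", "router"], now, period)` would correspond to step 514. The detail boost lapses automatically once the configured period ends.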

Referring to FIG. 5B, at 516, the ranked potential causes are tested utilizing database cycling in order to verify whether they may be actual causes of the problem. In a preferred embodiment, the potential causes are verified by utilizing information contained in the EDD 210 and HVD 212 and a process as described in U.S. Pat. No. 7,237,266, entitled “Electronic Vulnerability and Reliability Assessment.” At 518, the distances are calculated. At 520, a determination is made by a distance estimator module as to whether or not threshold levels set by the user or administrator are violated. If yes, at 522, the rankings in violation are discarded or decreased. In some embodiments, potential causes can be discarded using a configurable distance threshold set by the user, such that any possible causes with distances that exceed this threshold value are deemed exceedingly unlikely to be actual causes and can therefore be eliminated and subsequently ignored. In this example, the verification process verifies only the first and second potential causes. The third potential cause is discarded since it is determined not to be able to cause any vulnerability corresponding to the noted problem, and thus is not verified. The second change results in a very large verification distance, which subtracts greatly from its ranking, although it is not large enough to cause it to be discarded (i.e., its distance does not exceed the configurable threshold). The first change has a very small verification distance, which negligibly subtracts from its high ranking. If the threshold is not violated for one or more potential causes, at 524, a finalized ordered list of likely causes is prepared. For example, utilizing information from the EDD 210, preliminary rankings are adjusted to form the final rankings (e.g., final distances).
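Steps 520 to 524 can be sketched as a filter followed by a sort. The sketch drops potential causes that were not verified, discards any whose final distance exceeds the user-configured threshold, and orders the rest from most to least likely. The function name and the distance and threshold values in the usage example are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def finalize(final_distances: Dict[str, float], verified: Set[str],
             threshold: float) -> List[Tuple[str, float]]:
    """Keep verified causes within the distance threshold, closest (most likely) first."""
    kept = [(cid, d) for cid, d in final_distances.items()
            if cid in verified and d <= threshold]
    return sorted(kept, key=lambda item: item[1])


# Hypothetical values loosely following the example: change 3 fails verification,
# change 2 has a large distance that still does not exceed the threshold.
result = finalize({"change1": 12.0, "change2": 70.0, "change3": 55.0},
                  verified={"change1", "change2"}, threshold=80.0)
```

The result lists change1 first and then change2. Change3 is dropped because it was not verified, even though its distance is below the threshold.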

At 526, results are output and presented to a user or administrator. In an example, the final set of potential causes is presented as a smaller set of possible causes in an appropriate ordered format (e.g., from most likely to least likely cause or contributor to the problem). In an alternative embodiment, probabilities that the possible causes are the true causes can be calculated from the final distances and presented to the user (e.g., when a reference probability-distance pair can be determined). In an example, the user examining the output for the problem under examination observes that the software upgrade included a specific old version of SSH, which has known vulnerabilities that allow a hacker to obtain full access to the server on which it is installed. The vulnerability has been exploited widely. Therefore, this change is the likely true cause of the problem and should be corrected by replacing the older vulnerable SSH version with the newest SSH version available (which has been patched such that it is no longer vulnerable to this exploit). Based on the output information, the user chooses to end the examination, and the process ends at 528.

It should be emphasized that the above-described embodiments of the present invention, particularly, any “preferred” embodiments, are merely possible examples of implementations, merely set forth for a clear understanding of the principles of the invention. Many variations and modifications may be made to the above-described embodiment(s) of the invention without departing substantially from the spirit and principles of the invention. All such modifications and variations are intended to be included herein within the scope of this disclosure and the present invention and protected by the following claims.

1. A method for providing automated diagnosis of problems for an electronic system, comprising: identifying recent configuration changes made to the electronic system that fall within pre-established parameters; ranking the identified changes into potential causes; verifying ranked potential causes to determine whether any of the ranked potential causes may be an actual cause or contributor to the problem; and calculating distances associated with the ranked potential causes that correspond to a relative likelihood that potential causes may be a true cause. 